How to get the Memory size of a DataFrame in Pandas

# Table of Contents

- How to get the Memory size of a DataFrame in Pandas
- Including the memory footprint of object dtype columns in the result
- How to get the Memory size of a DataFrame using sys.getsizeof()
- Get the memory size of a DataFrame using DataFrame.info()

# How to get the Memory size of a DataFrame in Pandas

To get the memory size of a DataFrame in Pandas:

1. Use the DataFrame.memory_usage() method to get the number of bytes each column occupies.
2. Call the sum() method on the result to get the total memory size of the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Index    128
# Name      24
# Date      24
# dtype: int64
print(df.memory_usage())

print('-' * 50)

print(df.memory_usage(index=True).sum())  # 👉️ 176
```

The code for this article is available on GitHub

The DataFrame.memory_usage() method returns the memory usage of each column of the DataFrame in bytes.

You can use the index argument to specify whether to include the contribution of the index in the calculation.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Index    128
# Name      24
# Date      24
# dtype: int64
print(df.memory_usage(index=True))

print('-' * 50)

# Name    24
# Date    24
# dtype: int64
print(df.memory_usage(index=False))
```

By default, the index argument is set to True, which means the memory usage of the DataFrame's index is included in the returned Series.

As shown in the code sample, if index is set to True, its memory consumption is the first row in the output.
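As a quick check of the point above, the following sketch compares the two calls and confirms that the only difference between them is the `Index` entry. The variable names are illustrative, and the exact byte counts may differ between pandas versions, so none are hard-coded.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

with_index = df.memory_usage(index=True)
without_index = df.memory_usage(index=False)

# The index contribution is labeled 'Index' and comes first in the Series.
print(with_index.index[0])

# The difference between the two totals is the index's own memory usage.
index_bytes = with_index.sum() - without_index.sum()
print(index_bytes == with_index['Index'])
```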

To calculate the memory consumption of the entire DataFrame (in bytes), sum the memory usage of all columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

print(df.memory_usage(index=True).sum())  # 👉️ 176
```


Because memory_usage() returns a Series, the chained call is Series.sum(), which returns the sum of the values over the requested axis.

The method is equivalent to numpy.sum().
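The total is reported in raw bytes, which gets hard to read for larger DataFrames. Here is a minimal sketch of converting it to a more readable unit. Note that `format_bytes` is a hypothetical helper written for this example, not part of pandas, and it assumes 1 KB = 1024 bytes.

```python
import pandas as pd


def format_bytes(num_bytes):
    """Hypothetical helper: format a byte count using 1024-based units."""
    size = float(num_bytes)
    for unit in ('bytes', 'KB', 'MB', 'GB'):
        # Stop at the first unit that fits, or at the largest unit we support.
        if size < 1024 or unit == 'GB':
            return f'{size:.1f} {unit}'
        size /= 1024


df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

total_bytes = int(df.memory_usage(index=True).sum())

print(format_bytes(total_bytes))
print(format_bytes(2048))  # 2.0 KB
```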

# Including the memory footprint of object dtype columns in the result

If you want to include the memory footprint of object dtype columns in the result, set the deep argument to True when calling DataFrame.memory_usage().

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

print(df.memory_usage(deep=True).sum())  # 👉️ 514

print(df.memory_usage(deep=False).sum())  # 👉️ 176
```


If the deep argument is set to True, the calculation accounts for the full memory usage of the objects contained in the DataFrame.

By default, the deep argument is set to False, so the memory footprint of the values stored in object dtype columns is not included.

Here is an example of setting deep to True without chaining a sum() call.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Index    128
# Name     185
# Date     201
# dtype: int64
print(df.memory_usage(deep=True))

print('-' * 50)

# Index    128
# Name      24
# Date      24
# dtype: int64
print(df.memory_usage(deep=False))
```

Passing deep=False is the same as not passing the argument at all because False is its default value.
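To see how much each column gains from the deep calculation, you can put both results side by side. This is only a sketch: the column labels `shallow`, `deep` and `extra` are names chosen for this example, and the actual numbers depend on your pandas and Python versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

shallow = df.memory_usage(deep=False)
deep = df.memory_usage(deep=True)

# Both Series share the same labels (Index, Name, Date), so they align by label.
comparison = pd.DataFrame({
    'shallow': shallow,
    'deep': deep,
    'extra': deep - shallow,  # bytes only counted when deep=True
})

print(comparison)
```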

# How to get the Memory size of a DataFrame using sys.getsizeof()

You can also use the sys.getsizeof() function to get the memory size of a DataFrame.

```python
import sys

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

print(df.memory_usage(deep=True).sum())  # 👉️ 514

print(sys.getsizeof(df))  # 👉️ 530
```

The function returns the size of the supplied object in bytes.
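In the sample output, the two numbers differ only slightly (530 vs. 514), which looks like a small fixed per-object overhead on top of the deep memory usage. The sketch below prints both values and their difference so you can check this on your own setup. The variable names are illustrative, and the size of the gap is an observation from the sample output rather than a documented guarantee.

```python
import sys

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

deep_total = int(df.memory_usage(deep=True).sum())
reported = sys.getsizeof(df)

print(deep_total)
print(reported)

# Bytes reported by sys.getsizeof() beyond the deep column totals.
print(reported - deep_total)
```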

# Get the memory size of a DataFrame using DataFrame.info()

You can also use the DataFrame.info() method to get the memory size of a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# info() prints the summary itself and returns None,
# so there is no need to wrap it in print().
# memory usage: 176.0+ bytes
df.info()
```

The DataFrame.info() method prints a concise summary of a DataFrame.

You should be able to see the memory usage toward the end of the output.

You can also set the memory_usage argument to "deep" to include the memory footprint of object dtype columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# memory usage: 514.0 bytes
df.info(memory_usage='deep')
```

If the memory_usage argument is set to "deep", the calculation accounts for the full memory usage of the objects contained in the DataFrame.
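Because DataFrame.info() prints its summary instead of returning it, you need to redirect the output if you want the memory line as a string. A minimal sketch, assuming the summary contains a line that starts with "memory usage" (as in the output shown above), is to pass an in-memory buffer via the buf argument:

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bobby', 'Carl'],
    'Date': ['2023-07-12', '2023-08-23', '2023-08-21']
})

# Write the summary into a string buffer instead of stdout.
buffer = io.StringIO()
df.info(buf=buffer, memory_usage='deep')

summary = buffer.getvalue()

# Pick out the line that reports the memory usage.
memory_line = next(
    line for line in summary.splitlines()
    if line.startswith('memory usage')
)

print(memory_line)
```

If you only need the number, calling df.memory_usage(deep=True).sum() is simpler than parsing this text.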

